Prof. Dr. Eric Bodden

Implementing an intra-procedural data-flow analysis in Soot

Eric | September 22, 2008

After my last tutorials on using Soot on the command line, and using the Soot Eclipse plugin, this is the third of a series of blog posts about frequently asked questions with using Soot. Today’s topic will be on extending Soot with your own intra-procedural data-flow analysis.

What is an intra-procedural data-flow analysis anyway?

The Wikipedia page describes it quite alright: an intra-procedural data-flow analysis operates on a control-flow graph – called UnitGraph in Soot – for a single method. A unit graph contains statements as nodes, and there is an edge between two nodes if control can flow from the statement represented by the source node to the statement represented by the target node.

A data-flow analysis associates two elements with each node in the unit graph, usually two so-called flow sets: one in-set and one out-set. These sets are (1) initialized, then (2) propagated through the unit graph along statement nodes until (3) a fixed point is reached.

In the end, all you do is inspect the flow set before and after each statement. By the design of the analysis, your flow sets should tell you exactly the information you need.

Forward, backward or branched?

There exist three different kind of FlowAnalysis implementations in Soot:

ForwardFlowAnalysis – this analysis starts at the entry statement of a UnitGraph and propagates forward from there,
BackwardsFlowAnalysis – this analysis starts at the exit node(s) of a Unit Graph and propagates back from there (as an alternative you can produce the InverseGraph of your UnitGraph and use a ForwardFlowAnalysis; it does not matter); and
ForwardBranchedFlowAnalysis – this is essentially a forward analysis but it allows you to propagate different flow sets into different branches. For instance, if you process a statement like if(p!=null) then you may propagate the into “p is not null” into the “then” branch and “p is null” into the ”else” branch.

What direction you want to use depends entirely on your analysis problem. In the following we assume that we want to do a forward analysis.

Implementing the analysis interface

To implement a forward flow analysis, we subclass ForwardFlowAnalysis. A good example of what this could look like is GuaranteedDefsAnalysis.

Type parameters

ForwardFlowAnalysis is a generic class with two type parameters:

N: The node type. Usually this will be Unit but in general you are also allowed to have flow analyses over graphs that hold other kind of nodes.
A: The abstraction type. Oftentimes people use sets or maps as their abstraction but in general you can use anything you like. Beware though, that your abstraction type must implement equals(..) and hashCode() correctly, so that Soot can determine when a fixed point has been reached! (wait)

Constructor

You have to implement a constructor that at least takes a DirectedGraph<N> as an an argument (where N is your node type; see above) and passes this on to the super constructor. Also, you should call doAnalysis() at the end of your constructor, which will actually execute the flow analysis. Between the super constructor call and the call to doAnalysis() you can set up your own analysis data structures.

newInitialFlow() and entryInitialFlow()

The method newInitialFlow() must return an object of your abstraction type A. This object is assigned as the initial in-set and out-set for every statement, except the in-set of the first statement (respectively an exit statement, e.g. return or throw, if you are implementing a backwards analysis) of your unit graph. The in-set of the first statement is initialized via entryInitialFlow()

copy(..)

The copy(..) method takes two arguments of type A (your abstraction), a source and a target. It merely copies the source into the target. Note that the class A has to provide appropriate methods. In particular, A may not be immutable. You can work around this limitation by using a box or set type for A.

merge(..)

The merge(..) method is used to merge flow sets at merge points of the control-flow, e.g. at the end of a branching statement (if/then/else). Opposed to copy(..) it takes three arguments, an out-set from the left-hand branch, an out-set from the right-hand branch and another set, which is going to be the newly merged in-set for the next statement after the merge point.

flowThrough(..)

The last method one needs to implement is the method flowThrough(..). This method implements your actual flow function. It takes three elements as inputs:

An in-set of type A.
The node that is to be processed. This is of type N, i.e. usually a Unit.
An out-set of type A.

The content of this method again depends entirely on your analysis and abstraction.

That’s it really. Again, a good example is GuaranteedDefsAnalysis. To see all flow-analysis implementations in Soot, browse all sub-classes of FlowAnalysis.

Debugging your flow analysis in Eclipse

It is very easy to debug your flow analysis using the Soot Eclipse plugin. First install the plugin.

Then, next go to File –> New –> Example…

Select A simple BodyTransformer and assign a project name, e.g. MyFlowAnalysis. Implement your subclass of ForwardFlowAnalysis (or Backward… or Branched..) as stated above. Assume that we call it LocalMustNotAliasAnalysis in our case. You project structure should now look like shown on the right-hand side.

Modify MyMain.java so that it instantiates your analysis. In our case, the main method looks like this:

public static void main(String[] args) {
  PackManager.v().getPack("jtp").add(
    new Transform("jtp.myTransform", new BodyTransformer() {
      protected void internalTransform(Body body, String phase, Map options) {
        new LocalMustNotAliasAnalysis(new ExceptionalUnitGraph(body));
      }
    }));
  soot.Main.main(args);
}

This hooks up the analysis into the Jimple Transformation Pack of Soot.

Next we would like to launch your analysis within Eclipse. To do so, we need a test project which we can apply the analysis to. Create a test project with a simple class that you wish to analyze.

Next, right-click on the TestClass and select Soot –> Process Source File –> Run Soot…

In the General Options select the Interactive Mode.

On the tab Soot Main Class enter the name of your driver class, e.g. MyMain, and the Eclipse project that contains this class, e.g. MyFlowAnalysis. Then, select Run.

Soot will display the unit graph for the method that your are analyzing. You can step through the graph as usual. Along with each node, the plugin shows you the in-set and out-set. Soot determines the text that appears here by calling toString() on your abstraction object (type variable A).